Instance Evacuation prevent baseline re-computation after dropped #2740
…re that baseline only changes once when EVACUATE InstanceOperation is set and not again after the instance is dropped from the cluster.
0bce512 to e802c3b
overall change looks good. i just want to make sure we are not duplicating information
...core/src/main/java/org/apache/helix/controller/dataproviders/BaseControllerDataProvider.java
Failed one flaky test: TestClusterAccessor.testGetClusters
The sets of clusters are actually the same, but they are compared as lists, so the differing order makes the assertion fail. Will raise a separate PR to fix this.
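A minimal sketch of why an order-sensitive comparison is flaky here, assuming the test compares two collections of cluster names. The class and method names are hypothetical and are not taken from TestClusterAccessor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class ClusterOrderSketch {
  // List equality is order-sensitive: same elements in a different order are not equal.
  static boolean sameAsLists(List<String> expected, List<String> actual) {
    return expected.equals(actual);
  }

  // Set equality ignores order (and duplicates), which matches "same set of clusters".
  static boolean sameAsSets(List<String> expected, List<String> actual) {
    return new HashSet<>(expected).equals(new HashSet<>(actual));
  }

  public static void main(String[] args) {
    List<String> expected = Arrays.asList("clusterA", "clusterB");
    List<String> actual = Arrays.asList("clusterB", "clusterA");
    System.out.println(sameAsLists(expected, actual)); // false
    System.out.println(sameAsSets(expected, actual));  // true
  }
}
```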
thanks for explaining offline, looks good.
This PR is ready to be merged! Final commit message:
Issues
Description
We now control the filtering of certain "Assignable" instances through the BaseControllerDataProvider. This centralizes the logic and decouples it from the rebalancers. This PR migrates EVACUATE to this approach. Putting the logic here allows methods like getAssignableInstanceConfigMap to respect the "Assignable" states. When the return value of such a method changes, a new baseline is computed. If the new baseline differs from the previous one, which it will for EVACUATE, partitions are shuffled.
When the baseline is calculated, whether an instance is enabled, live, or set to EVACUATE is not respected. The assignment assumes the "happy path" where all of these instances are assignable or will be assignable. When an EVACUATE node is dropped, a differing baseline is calculated, which causes shuffling after the evacuation is complete.
To fix this, setting an InstanceOperation to EVACUATE ensures that the instance is not returned by getAssignable*. This forces the baseline to be recomputed without considering the EVACUATE instance. When the instance is dropped, the arguments to baseline computation are the same as in the previous computation, so the baseline does not change.
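The effect on the baseline inputs can be illustrated with a small sketch. It is not the Helix implementation: the `Operation` enum, the `assignable` helper, and the host names are hypothetical stand-ins, and the baseline input is reduced to a set of instance names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class EvacuateBaselineSketch {
  // Hypothetical stand-in for the per-instance operation; not the Helix type.
  enum Operation { NONE, EVACUATE }

  // Instances fed to baseline computation in this sketch:
  // everything in the cluster except instances marked EVACUATE.
  static Set<String> assignable(Map<String, Operation> cluster) {
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Operation> entry : cluster.entrySet()) {
      if (entry.getValue() != Operation.EVACUATE) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Operation> cluster = new LinkedHashMap<>();
    cluster.put("host1", Operation.NONE);
    cluster.put("host2", Operation.NONE);
    cluster.put("host3", Operation.NONE);
    Set<String> before = assignable(cluster);

    // Marking host3 as EVACUATE changes the input once.
    cluster.put("host3", Operation.EVACUATE);
    Set<String> afterEvacuate = assignable(cluster);

    // Dropping host3 later leaves the input unchanged.
    cluster.remove("host3");
    Set<String> afterDrop = assignable(cluster);

    System.out.println(before.equals(afterEvacuate));    // false
    System.out.println(afterEvacuate.equals(afterDrop)); // true
  }
}
```

In this simplified model the only input change happens when EVACUATE is set, which is the single baseline change the description aims for.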
To maintain N -> N + 1 -> N behavior, we change computeBestPossiblePartitionState to pass all liveInstances so that the replicas on the EVACUATE instance can still be part of the stateMap until the new replica is created.
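A rough sketch of the N -> N + 1 -> N idea follows. It is a deliberately simplified model, not the logic of the Helix rebalancers: the `hostsInStateMap` helper, its parameters, and the rule "keep existing holders until every preferred instance holds the replica" are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class EvacuateStateMapSketch {
  // Hypothetical model: which instances appear in the state map for one partition.
  // preferenceList: target placement (computed from assignable instances only).
  // candidates:     instances allowed to appear in the state map.
  // currentHolders: instances that currently hold a replica.
  static Set<String> hostsInStateMap(List<String> preferenceList,
                                     Set<String> candidates,
                                     Set<String> currentHolders) {
    Set<String> hosts = new TreeSet<>();
    boolean allPreferredReady = true;
    for (String instance : preferenceList) {
      if (candidates.contains(instance)) {
        hosts.add(instance);
      }
      if (!currentHolders.contains(instance)) {
        allPreferredReady = false;
      }
    }
    // Keep existing replicas on candidate instances until the new ones are up.
    if (!allPreferredReady) {
      for (String holder : currentHolders) {
        if (candidates.contains(holder)) {
          hosts.add(holder);
        }
      }
    }
    return hosts;
  }

  public static void main(String[] args) {
    List<String> preference = Arrays.asList("h1", "h2", "h4"); // h3 is EVACUATE
    Set<String> live = new HashSet<>(Arrays.asList("h1", "h2", "h3", "h4"));
    Set<String> assignableOnly = new HashSet<>(Arrays.asList("h1", "h2", "h4"));
    Set<String> holders = new HashSet<>(Arrays.asList("h1", "h2", "h3"));

    // All live instances as candidates: h3 stays until h4 is up (N + 1).
    System.out.println(hostsInStateMap(preference, live, holders)); // [h1, h2, h3, h4]

    // Once h4 holds the replica, h3 is released (back to N).
    holders.add("h4");
    System.out.println(hostsInStateMap(preference, live, holders)); // [h1, h2, h4]

    // Only assignable instances as candidates: h3 is removed before h4 is up.
    holders.remove("h4");
    System.out.println(hostsInStateMap(preference, assignableOnly, holders)); // [h1, h2, h4]
  }
}
```

In the last call the replica on h3 disappears from the map while h4 has not yet come up, which is the dip below N that passing all live instances is meant to avoid in this model.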
Tests
Changes that Break Backward Compatibility (Optional)
Documentation (Optional)
NA
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)